Option compatible reward inverse reinforcement learning
Authors
Abstract
Reinforcement learning in complex environments is a challenging problem. In particular, the success of reinforcement learning algorithms depends on a well-designed reward function. Inverse reinforcement learning (IRL) solves the problem of recovering reward functions from expert demonstrations. In this paper, we solve a hierarchical IRL problem within the options framework, which allows us to utilize intrinsic motivation. A gradient method for parametrized options is used to deduce a defining equation for the Q-feature space, which leads to a reward feature space. Using a second-order optimality condition on the option parameters, an optimal reward function is selected. Experimental results in both discrete and continuous domains confirm that our recovered rewards provide a solution to IRL using temporal abstraction and, in turn, are effective in accelerating transfer learning tasks. We also show that the recovered rewards are robust to noise contained in the demonstrations.
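The abstract compresses the derivation, so here is a minimal sketch of the stationarity condition it refers to, written in standard option-critic notation (intra-option policy $\pi_\theta(a\mid s,\omega)$, termination $\beta_\vartheta$, action value $Q_U$). The symbols and the exact form of the Bellman relation are assumptions made for illustration, not quotations from the paper.

$$\nabla_\theta J(\theta) \;=\; \mathbb{E}_{(s,\omega)\sim\mu,\; a\sim\pi_\theta(\cdot\mid s,\omega)}\!\big[\nabla_\theta \log \pi_\theta(a\mid s,\omega)\, Q_U(s,\omega,a)\big] \;=\; 0$$

Stationarity of the expert's option parameters forces $Q_U$ to be orthogonal, under the occupancy measure $\mu$, to the span of the score features $\nabla_\theta \log \pi_\theta(a\mid s,\omega)$; a basis of that orthogonal complement is one reading of the "Q-feature space" above. Reward features then follow from the intra-option Bellman equation

$$Q_U(s,\omega,a) \;=\; r(s,a) \;+\; \gamma\,\mathbb{E}_{s'}\!\big[(1-\beta_\vartheta(s',\omega))\,Q_\Omega(s',\omega) + \beta_\vartheta(s',\omega)\,V_\Omega(s')\big],$$

so that $r(s,a) = Q_U(s,\omega,a) - \gamma\,\mathbb{E}_{s'}[\cdot]$, and the second-order condition (a negative-definite Hessian $\nabla^2_\theta J$ at the expert's parameters) selects one reward among the many that satisfy the first-order condition.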
Similar resources
Compatible Reward Inverse Reinforcement Learning
PROBLEM
• Inverse Reinforcement Learning (IRL): recover a reward function explaining a set of expert demonstrations.
• Advantages of IRL over Behavioral Cloning (BC): transferability of the reward.
• Issues with some IRL methods:
  – How to build the features for the reward function?
  – How to select a reward function among all the optimal ones?
  – What if there is no access to the environment? ...
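The two open issues listed in this snippet (building reward features and selecting among the many optimal rewards) are what the compatible-feature construction addresses. Below is a minimal sample-based sketch in Python for the flat-policy case; the function name, argument names, and the SVD-based null-space computation are illustrative assumptions, not the option-based algorithm of the main paper.

```python
import numpy as np

def compatible_q_features(score, weights, tol=1e-10):
    """Basis (columns) of the sample-based Q-feature space: vectors q over the
    N sampled state-action pairs satisfying
        sum_i weights[i] * score[i, :] * q[i] = 0,
    i.e. q is orthogonal to every policy-score direction under the empirical
    occupancy weights, so the expert's policy gradient stays zero.

    score   : (N, k) array, rows are grad_theta log pi_theta(a_i | s_i)
    weights : (N,) array, empirical (discounted) occupancy of each sample
    """
    G = weights[:, None] * score            # (N, k) weighted score matrix
    _, s, vt = np.linalg.svd(G.T, full_matrices=True)
    rank = int(np.sum(s > tol))
    return vt[rank:].T                      # (N, N - rank) null-space basis of G^T
```

Each column of the returned basis is a candidate Q vector compatible with the expert's first-order optimality; reward candidates follow by Bellman inversion (subtracting the discounted next-state value estimated from the same samples), and a second-order criterion can then rank them.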
Active Learning for Reward Estimation in Inverse Reinforcement Learning
Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, we introduce active learning for inverse reinforcement learning. We propose an algorithm that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at “arbitrary” ...
Inverse Reinforcement Learning with Locally Consistent Reward Functions
Existing inverse reinforcement learning (IRL) algorithms have assumed each expert’s demonstrated trajectory to be produced by only a single reward function. This paper presents a novel generalization of the IRL problem that allows each trajectory to be generated by multiple locally consistent reward functions, hence catering to more realistic and complex experts’ behaviors. Solving our generali...
Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions
We present a nonparametric Bayesian approach to inverse reinforcement learning (IRL) for multiple reward functions. Most previous IRL algorithms assume that the behaviour data is obtained from an agent who is optimizing a single reward function, but this assumption is hard to guarantee in practice. Our approach is based on integrating the Dirichlet process mixture model into Bayesian IRL. We pr...
Repeated Inverse Reinforcement Learning
We introduce a novel repeated Inverse Reinforcement Learning problem: the agent has to act on behalf of a human in a sequence of tasks and wishes to minimize the number of tasks that it surprises the human by acting suboptimally with respect to how the human would have acted. Each time the human is surprised, the agent is provided a demonstration of the desired behavior by the human. We formali...
Journal
Journal title: Pattern Recognition Letters
Year: 2022
ISSN: 1872-7344, 0167-8655
DOI: https://doi.org/10.1016/j.patrec.2022.01.016